National Repository of Grey Literature
Filtering of Texts Extracted from PDF, OCR or Web
Lehnert, Filip ; Plchot, Oldřich (referee) ; Szőke, Igor (advisor)
The objective of this thesis is to implement a set of scripts that improve the conversion of various types of documents into plain text. Converting between file formats introduces noise and incorrectly converted characters. The scripts clean the extracted text file so that the resulting text is readable, makes sense, and contains no residue of stray characters left over from converting graphs, tables, formulas, etc. The scripts work universally and are not limited to input produced by OCR tools or by conversion from PDF or the web.
Methodology and issues of data transformation and estimating its information value during the integration of heterogeneous information sources
Bartoš, Ivan ; Papík, Richard (advisor) ; Dvořák, Jan (referee) ; Bureš, Miroslav (referee)
This study focuses mainly on the issue of data and information transformation. The topic is currently critical in several scientific and commercial areas. Information value, information quality, and the quality of the source data differ between systems. This is due not only to the different topologies of the information sources but also to their different understanding of, and manner of storing, the information that describes the entities of the enterprise. Such information systems (within the scope of the thesis, database systems) can perform well as stand-alone systems. The issue arises when such heterogeneous systems have to be integrated and information has to be migrated between them. The thesis is logically divided into four major parts based on these issues. The first part describes methods that can be used to classify the data quality of the source system (the one to be integrated) from which the information is to be extracted. Because project and system documentation is commonly lacking, the methods introduced here can be used for such classification even when the...
